Kalman Filter — linear Gaussian state-space model (NumPy + Plotly)#

The Kalman filter is a recursive Bayesian estimator for linear dynamical systems with Gaussian noise.

At each time step it computes the posterior distribution of a hidden (latent) state \(\mathbf{x}_t\) given the observations \(\mathbf{y}_{1:t}\):

  • filtered mean: \(\hat{\mathbf{x}}_{t\mid t} = \mathbb{E}[\mathbf{x}_t \mid \mathbf{y}_{1:t}]\)

  • filtered covariance: \(\mathbf{P}_{t\mid t} = \mathrm{Cov}(\mathbf{x}_t \mid \mathbf{y}_{1:t})\)

This notebook explains:

  • what the algorithm is and why it is a state-space model

  • how it differs from ARIMA-style time-series models

  • where it is used (tracking, finance, control systems)

  • intuition for the predict → update cycle and uncertainty propagation

  • a low-level NumPy implementation + Plotly visualizations

Prerequisites#

  • Linear algebra: vectors/matrices, transpose, matrix multiplication

  • Gaussians: mean/covariance, conditioning intuition (“posterior = prior + data”)

  • Basic time-series vocabulary (state, observation, noise)

import platform

import numpy as np
import plotly
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)

rng = np.random.default_rng(0)

print("Python", platform.python_version())
print("NumPy", np.__version__)
print("Plotly", plotly.__version__)
Python 3.12.9
NumPy 1.26.2
Plotly 6.5.2

1) What the Kalman filter is#

The Kalman filter is an online (streaming) algorithm that updates an estimate as new measurements arrive.

It solves this problem:

A system evolves over time with some randomness, and we observe noisy measurements of it. We want the best estimate of the system’s current (hidden) state.

In the linear + Gaussian case, the posterior distribution remains Gaussian. That means we only need to carry forward two objects:

  • the mean estimate \(\hat{\mathbf{x}}\) (“where we think the state is”)

  • the covariance \(\mathbf{P}\) (“how uncertain we are”)

Under the Kalman filter assumptions, this recursion is optimal among all estimators in the mean-squared-error sense.

2) Why it is a state-space model#

A state-space model describes a process using:

  • a state transition equation: how the hidden state changes over time

  • an observation equation: how measurements are generated from the state

Two key conditional independence (Markov) ideas:

  • State is Markov: \(p(\mathbf{x}_t \mid \mathbf{x}_{0:t-1}) = p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\)

  • Observation depends on current state only: \(p(\mathbf{y}_t \mid \mathbf{x}_{0:t}, \mathbf{y}_{1:t-1}) = p(\mathbf{y}_t \mid \mathbf{x}_t)\)

This separation is why Kalman filtering is so common in tracking and control: you explicitly model both dynamics (physics) and measurement (sensor).

3) Hidden (latent) state vs observed variables#

  • Hidden state (\(\mathbf{x}_t\)): the quantity you care about but cannot directly observe.

  • Observed variable (\(\mathbf{y}_t\)): what you can measure (often noisy and partial).

Example (tracking):

  • State: \(\mathbf{x}_t = [\text{position}_t,\ \text{velocity}_t]^\top\)

  • Observation: \(\mathbf{y}_t = [\text{measured position}_t]\)

You never measure velocity directly, but you infer it because the model links position and velocity over time.

4) Model assumptions (linearity, Gaussian noise)#

The classical Kalman filter is exact for linear Gaussian state-space models.

Assumptions (typical form):

  • Linearity: both dynamics and measurement are linear functions of the state.

  • Gaussian noise: both process noise and measurement noise are Gaussian.

  • Independence: noises are independent across time and independent of the initial state.

  • Known parameters: you know (or choose/tune) the matrices \(\mathbf{F}, \mathbf{H}\) and covariances \(\mathbf{Q}, \mathbf{R}\).

If your system is nonlinear or non-Gaussian, you typically move to variants like the Extended KF (EKF), Unscented KF (UKF), or particle filters.

5) State transition and observation equations#

A discrete-time linear Gaussian state-space model is:

State transition (dynamics)#

\[ \mathbf{x}_t = \mathbf{F}_t\,\mathbf{x}_{t-1} + \mathbf{B}_t\,\mathbf{u}_t + \mathbf{w}_t, \quad \mathbf{w}_t \sim \mathcal{N}(\mathbf{0},\ \mathbf{Q}_t) \]

Observation model (measurement)#

\[ \mathbf{y}_t = \mathbf{H}_t\,\mathbf{x}_t + \mathbf{v}_t, \quad \mathbf{v}_t \sim \mathcal{N}(\mathbf{0},\ \mathbf{R}_t) \]

Initial state#

\[ \mathbf{x}_0 \sim \mathcal{N}(\hat{\mathbf{x}}_{0\mid 0},\ \mathbf{P}_{0\mid 0}) \]

Step-by-step meaning:

  • \(\mathbf{x}_t \in \mathbb{R}^n\): latent state vector (size \(n\))

  • \(\mathbf{y}_t \in \mathbb{R}^m\): observation vector (size \(m\))

  • \(\mathbf{F}_t\): state transition matrix (how state evolves)

  • \(\mathbf{u}_t\): optional control input (known exogenous action)

  • \(\mathbf{B}_t\): maps control input into the state

  • \(\mathbf{w}_t\): process noise (unmodeled dynamics / disturbances)

  • \(\mathbf{Q}_t\): covariance of process noise (how “wiggly” the dynamics are)

  • \(\mathbf{H}_t\): observation matrix (maps state into measurement space)

  • \(\mathbf{v}_t\): measurement noise (sensor error)

  • \(\mathbf{R}_t\): covariance of measurement noise (sensor quality)

6) Kalman filter equations (prediction + update)#

Notation:

  • \(\hat{\mathbf{x}}_{t\mid t-1}\): estimate at time \(t\) using observations up to \(t-1\) (prediction / prior)

  • \(\hat{\mathbf{x}}_{t\mid t}\): estimate at time \(t\) using observations up to \(t\) (update / posterior)

  • \(\mathbf{P}_{t\mid t-1}\) and \(\mathbf{P}_{t\mid t}\): the corresponding covariances

Prediction step#

\[ \hat{\mathbf{x}}_{t\mid t-1} = \mathbf{F}_t\,\hat{\mathbf{x}}_{t-1\mid t-1} + \mathbf{B}_t\,\mathbf{u}_t \]
\[ \mathbf{P}_{t\mid t-1} = \mathbf{F}_t\,\mathbf{P}_{t-1\mid t-1}\,\mathbf{F}_t^\top + \mathbf{Q}_t \]

Step-by-step:

  • The mean is pushed forward by the dynamics.

  • The covariance is pushed forward and then inflated by \(\mathbf{Q}_t\) to represent new uncertainty introduced by the process.

Update step#

Innovation (measurement residual):

\[ \tilde{\mathbf{y}}_t = \mathbf{y}_t - \mathbf{H}_t\,\hat{\mathbf{x}}_{t\mid t-1} \]

Innovation covariance:

\[ \mathbf{S}_t = \mathbf{H}_t\,\mathbf{P}_{t\mid t-1}\,\mathbf{H}_t^\top + \mathbf{R}_t \]

Kalman gain:

\[ \mathbf{K}_t = \mathbf{P}_{t\mid t-1}\,\mathbf{H}_t^\top\,\mathbf{S}_t^{-1} \]

Posterior mean:

\[ \hat{\mathbf{x}}_{t\mid t} = \hat{\mathbf{x}}_{t\mid t-1} + \mathbf{K}_t\,\tilde{\mathbf{y}}_t \]

Posterior covariance (common form):

\[ \mathbf{P}_{t\mid t} = (\mathbf{I} - \mathbf{K}_t\,\mathbf{H}_t)\,\mathbf{P}_{t\mid t-1} \]

Step-by-step:

  • \(\tilde{\mathbf{y}}_t\) says how surprising the observation is relative to the prediction.

  • \(\mathbf{S}_t\) says how uncertain we expected the measurement to be (prediction uncertainty projected into measurement space + sensor noise).

  • \(\mathbf{K}_t\) is an adaptive weight: when sensors are noisy (large \(\mathbf{R}_t\)), you trust the model more; when dynamics are noisy (large \(\mathbf{Q}_t\)), you trust measurements more.

  • The covariance shrinks after incorporating data, especially in the directions you observe.
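Putting the two steps together, a single predict → update iteration fits in a few lines of NumPy. The matrices and numbers below are purely illustrative and are not tied to the tracking example later in this notebook:

```python
import numpy as np

# Illustrative 2-state, 1-observation model (made-up numbers)
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])

x_prev = np.array([0.0, 1.0])  # filtered mean at t-1
P_prev = np.eye(2)             # filtered covariance at t-1
y_t = np.array([1.5])          # new measurement

# ---- Prediction ----
x_prior = F @ x_prev
P_prior = F @ P_prev @ F.T + Q        # covariance inflated by Q

# ---- Update ----
innov = y_t - H @ x_prior             # innovation
S = H @ P_prior @ H.T + R             # innovation covariance
K = P_prior @ H.T @ np.linalg.inv(S)  # gain (explicit inverse is fine at 1x1)
x_post = x_prior + K @ innov
P_post = (np.eye(2) - K @ H) @ P_prior

# Position variance shrinks after incorporating a position measurement
print(P_prior[0, 0], P_post[0, 0])
```

Note how only the first (observed) state direction shrinks sharply; velocity improves indirectly through its correlation with position in `P_prior`.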

7) Prediction vs update intuition (1D version)#

In 1D with a direct observation (\(y_t = x_t + v_t\)), the update looks like:

\[ \hat{x}_{t\mid t} = \hat{x}_{t\mid t-1} + k_t\,(y_t - \hat{x}_{t\mid t-1}), \quad k_t = \frac{p_{t\mid t-1}}{p_{t\mid t-1} + r} \]

This is a weighted average between prediction and measurement:

  • if measurement noise variance \(r\) is large → \(k_t\) is small → you mostly keep the prediction

  • if prediction variance \(p_{t\mid t-1}\) is large → \(k_t\) is large → you lean toward the measurement

The multivariate Kalman filter is the same idea with matrices.
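Plugging made-up numbers into the 1D formula makes the weighting concrete: with prior variance \(p_{t\mid t-1} = 4\) and measurement variance \(r = 1\), the gain is \(k_t = 4/5\), so the posterior mean moves 80% of the way from the prediction toward the measurement:

```python
p_prior = 4.0   # prediction variance (illustrative)
r = 1.0         # measurement noise variance (illustrative)
x_prior = 10.0  # predicted state
y = 12.0        # noisy measurement

k = p_prior / (p_prior + r)           # gain: 4 / 5 = 0.8
x_post = x_prior + k * (y - x_prior)  # 10 + 0.8 * (12 - 10) = 11.6
p_post = (1.0 - k) * p_prior          # (1 - 0.8) * 4 ≈ 0.8

print(k, x_post, p_post)
```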

8) How it differs from ARIMA-style models#

ARIMA-style models (AR, MA, ARMA, ARIMA, SARIMA) are usually presented as models for an observed univariate time series \(y_t\):

  • they model autocorrelation via lagged terms (AR) and lagged shocks (MA)

  • they often assume stationarity after differencing (the “I” part)

  • the core object is the observed series itself

The Kalman filter, in contrast:

  • explicitly separates latent state \(\mathbf{x}_t\) from measurements \(\mathbf{y}_t\)

  • supports multivariate states/observations naturally

  • handles missing observations and irregular measurement patterns cleanly (skip or modify the update)

  • can include control inputs \(\mathbf{u}_t\) (common in control systems)

Important connection:

  • Many ARIMA models can be rewritten as state-space models, and the Kalman filter can then be used to evaluate the likelihood and produce filtered estimates.

  • Conceptually: ARIMA is a model class; the Kalman filter is an inference algorithm for a (linear Gaussian) state-space representation.
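To make that connection concrete, here is a minimal sketch of an AR(1) process \(y_t = \phi y_{t-1} + \varepsilon_t\) written in the state-space template; the coefficient and noise level are arbitrary illustrative choices, and for a pure AR(1) observed without error the "state" is the series itself with \(\mathbf{R} = 0\):

```python
import numpy as np

phi = 0.9        # AR(1) coefficient (illustrative)
sigma_eps = 1.0  # shock std (illustrative)

# State-space form: x_t = F x_{t-1} + w_t,  y_t = H x_t + v_t
F = np.array([[phi]])
H = np.array([[1.0]])
Q = np.array([[sigma_eps**2]])
R = np.array([[0.0]])  # no measurement noise for a pure AR(1)

# Stationary variance of an AR(1): sigma^2 / (1 - phi^2)
stat_var = sigma_eps**2 / (1.0 - phi**2)

# Quick empirical check by direct simulation of the AR(1) recursion
rng = np.random.default_rng(1)
T = 50_000
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal(0.0, sigma_eps)

print(stat_var, y[5_000:].var())  # empirical variance ≈ theoretical
```

Higher-order ARMA models need a larger state vector (stacking lags and lagged shocks), but the template is the same.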

9) Where it is used (tracking, finance, control)#

Common application categories:

  • Tracking / navigation / sensor fusion: estimating position/velocity from GPS + IMU, radar/sonar tracking, robotics localization.

  • Control systems: state estimation inside feedback loops (e.g., LQG control: LQR + Kalman filter), industrial process control, aircraft/spacecraft navigation.

  • Finance / econometrics: estimating latent trends/levels, time-varying regression coefficients (dynamic linear models), smoothing noisy signals, nowcasting.

10) Requirements to use the model correctly#

To use a Kalman filter well, you need more than code — you need a reasonable model.

Key requirements:

  • State design: choose \(\mathbf{x}_t\) so the process is approximately Markov (the state should contain the information needed to predict the next state).

  • Correct dimensions: make sure \(\mathbf{F}\) is \(n\times n\), \(\mathbf{H}\) is \(m\times n\), \(\mathbf{Q}\) is \(n\times n\), \(\mathbf{R}\) is \(m\times m\).

  • Noise covariances: \(\mathbf{Q}\) and \(\mathbf{R}\) must be symmetric positive semidefinite (and \(\mathbf{R}\) typically positive definite so \(\mathbf{S}_t\) is invertible).

  • Units + time step: \(\mathbf{F}, \mathbf{Q}, \mathbf{R}\) must correspond to the same sampling interval (e.g., if you change \(\Delta t\), you must adjust them).

  • Observability: measurements must contain enough information to infer the state (otherwise some components remain weakly determined).

  • Outliers / non-Gaussian noise: heavy tails or outliers can break the Gaussian assumption; consider robust filtering or gating.

  • Tuning: \(\mathbf{Q}\) vs \(\mathbf{R}\) encodes how much you trust the dynamics vs measurements; poor tuning leads to lag, overreaction, or divergence.
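The shape and PSD requirements above are cheap to verify programmatically before running a filter. A small validation sketch (the helper `check_lgssm` is our own illustration, not a library function):

```python
import numpy as np

def check_lgssm(F, H, Q, R):
    """Sanity-check shapes and noise covariances of a linear Gaussian model."""
    n = F.shape[0]
    m = H.shape[0]
    assert F.shape == (n, n), "F must be n x n"
    assert H.shape == (m, n), "H must be m x n"
    assert Q.shape == (n, n) and R.shape == (m, m), "Q is n x n, R is m x m"
    for name, M in (("Q", Q), ("R", R)):
        assert np.allclose(M, M.T), f"{name} must be symmetric"
        # PSD check via eigenvalues (small negative tolerance for round-off)
        assert np.linalg.eigvalsh(M).min() >= -1e-10, f"{name} must be PSD"
    # R should usually be positive definite so S_t is invertible
    assert np.linalg.eigvalsh(R).min() > 0, "R should be positive definite"
    return n, m

# Example: constant-velocity matrices like those used later in this notebook
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.25 * np.array([[dt**4 / 4, dt**3 / 2], [dt**3 / 2, dt**2]])
R = np.array([[4.0]])
print(check_lgssm(F, H, Q, R))
```

Note that the constant-velocity `Q` here is rank-1 (eigenvalue zero in one direction), which is fine: `Q` only needs to be positive semidefinite, while `R` should be strictly positive definite.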

11) A concrete example: tracking 1D position with noisy sensors#

We’ll simulate a simple system:

  • State: \(\mathbf{x}_t = [p_t, v_t]^\top\) (position + velocity)

  • Dynamics: constant velocity, with random acceleration treated as process noise

  • Observation: we measure position only, with measurement noise

def make_constant_velocity_model(dt: float, sigma_a: float, sigma_z: float):
    """Return (F, H, Q, R) for a 1D constant-velocity model.

    State x_t = [position, velocity]^T.

    We treat unknown acceleration as process noise with std sigma_a.
    We observe position with measurement noise std sigma_z.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])

    # Standard constant-velocity discretization with white acceleration noise
    q = sigma_a**2
    Q = q * np.array(
        [[dt**4 / 4.0, dt**3 / 2.0], [dt**3 / 2.0, dt**2]], dtype=float
    )

    R = np.array([[sigma_z**2]], dtype=float)
    return F, H, Q, R


def simulate_lgssm(
    F: np.ndarray,
    H: np.ndarray,
    Q: np.ndarray,
    R: np.ndarray,
    x0: np.ndarray,
    steps: int,
    rng: np.random.Generator,
):
    """Simulate a linear Gaussian state-space model.

    x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)
    y_t = H x_t + v_t,      v_t ~ N(0, R)
    """
    n = F.shape[0]
    m = H.shape[0]
    x = np.zeros((steps, n), dtype=float)
    y = np.zeros((steps, m), dtype=float)
    x[0] = x0

    for t in range(steps):
        if t > 0:
            w_t = rng.multivariate_normal(mean=np.zeros(n), cov=Q)
            x[t] = F @ x[t - 1] + w_t
        v_t = rng.multivariate_normal(mean=np.zeros(m), cov=R)
        y[t] = H @ x[t] + v_t

    return x, y


dt = 1.0
T = 80

# "True" noise levels used to generate the data
sigma_a_true = 0.5   # process noise std (acceleration)
sigma_z_true = 2.0   # measurement noise std (position sensor)

F, H, Q_true, R_true = make_constant_velocity_model(dt, sigma_a_true, sigma_z_true)

x0_true = np.array([0.0, 1.0])
x_true, y_obs = simulate_lgssm(F, H, Q_true, R_true, x0_true, steps=T, rng=rng)

print("F=\n", F)
print("H=\n", H)
print("Q_true=\n", Q_true)
print("R_true=\n", R_true)
print("x_true shape", x_true.shape, "y_obs shape", y_obs.shape)
F=
 [[1. 1.]
 [0. 1.]]
H=
 [[1. 0.]]
Q_true=
 [[0.0625 0.125 ]
 [0.125  0.25  ]]
R_true=
 [[4.]]
x_true shape (80, 2) y_obs shape (80, 1)

12) Low-level NumPy implementation (no filtering libraries)#

Below is a straightforward Kalman filter implementation that keeps track of:

  • predicted (prior) mean/covariance: \((\hat{\mathbf{x}}_{t\mid t-1}, \mathbf{P}_{t\mid t-1})\)

  • filtered (posterior) mean/covariance: \((\hat{\mathbf{x}}_{t\mid t}, \mathbf{P}_{t\mid t})\)

Implementation notes:

  • It uses np.linalg.solve instead of explicitly computing \(\mathbf{S}^{-1}\).

  • It updates covariance with the Joseph form for better numerical stability.

  • If an observation contains NaN, it skips the update step (useful for missing data).

def kalman_filter(
    y: np.ndarray,
    F: np.ndarray,
    H: np.ndarray,
    Q: np.ndarray,
    R: np.ndarray,
    x0: np.ndarray,
    P0: np.ndarray,
    B: np.ndarray | None = None,
    u: np.ndarray | None = None,
):
    """Run a discrete-time Kalman filter.

    Parameters
    ----------
    y:
        Observations, shape (T, m) or (T,).
    F, H, Q, R:
        State-space matrices.
    x0, P0:
        Initial filtered mean/covariance (t=0).
    B, u:
        Optional control matrix and control sequence (T, k) or (T,).

    Returns
    -------
    dict with keys: x_pred, P_pred, x_filt, P_filt, K, innovation, S
    """
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]

    F = np.asarray(F, dtype=float)
    H = np.asarray(H, dtype=float)
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)
    x_prev = np.asarray(x0, dtype=float)
    P_prev = np.asarray(P0, dtype=float)

    T, m = y.shape
    n = F.shape[0]

    x_pred = np.zeros((T, n), dtype=float)
    P_pred = np.zeros((T, n, n), dtype=float)
    x_filt = np.zeros((T, n), dtype=float)
    P_filt = np.zeros((T, n, n), dtype=float)
    K = np.zeros((T, n, m), dtype=float)
    innovation = np.zeros((T, m), dtype=float)
    S = np.zeros((T, m, m), dtype=float)

    I = np.eye(n)

    for t in range(T):
        # ---- Prediction (prior) ----
        if B is not None and u is not None:
            u_t = u[t]
            x_pr = F @ x_prev + B @ u_t
        else:
            x_pr = F @ x_prev
        P_pr = F @ P_prev @ F.T + Q

        x_pred[t] = x_pr
        P_pred[t] = P_pr

        # ---- Update (posterior) ----
        y_t = y[t]
        if np.any(np.isnan(y_t)):
            # Missing measurement: carry forward the prediction.
            x_po = x_pr
            P_po = P_pr
            K_t = np.zeros((n, m))
            innov = np.full((m,), np.nan)
            S_t = np.full((m, m), np.nan)
        else:
            innov = y_t - (H @ x_pr)
            S_t = H @ P_pr @ H.T + R
            PHt = P_pr @ H.T

            # K = P H^T S^{-1}  (use solve instead of inverse)
            K_t = np.linalg.solve(S_t, PHt.T).T

            x_po = x_pr + K_t @ innov

            # Joseph form: P = (I-KH) P (I-KH)^T + K R K^T
            KH = K_t @ H
            P_po = (I - KH) @ P_pr @ (I - KH).T + K_t @ R @ K_t.T
            P_po = 0.5 * (P_po + P_po.T)  # enforce symmetry

        x_filt[t] = x_po
        P_filt[t] = P_po
        K[t] = K_t
        innovation[t] = innov
        S[t] = S_t

        x_prev, P_prev = x_po, P_po

    return {
        "x_pred": x_pred,
        "P_pred": P_pred,
        "x_filt": x_filt,
        "P_filt": P_filt,
        "K": K,
        "innovation": innovation,
        "S": S,
    }
# Filter setup: deliberately start with uncertainty about the initial state
x0_guess = np.array([0.0, 0.0])
P0_guess = np.diag([10.0**2, 10.0**2])

res = kalman_filter(y_obs, F, H, Q_true, R_true, x0_guess, P0_guess)
x_pred = res["x_pred"]
P_pred = res["P_pred"]
x_filt = res["x_filt"]
P_filt = res["P_filt"]

rmse_pos = float(np.sqrt(np.mean((x_filt[:, 0] - x_true[:, 0]) ** 2)))
rmse_vel = float(np.sqrt(np.mean((x_filt[:, 1] - x_true[:, 1]) ** 2)))
print("RMSE position:", rmse_pos)
print("RMSE velocity:", rmse_vel)
RMSE position: 1.4082456807214128
RMSE velocity: 0.7170998763678297

13) Plotly: true state vs noisy observations vs filtered estimate#

We plot position (observed) along with the filter estimate and a ±2σ uncertainty band.

t = np.arange(T)

pos_true = x_true[:, 0]
pos_obs = y_obs[:, 0]
pos_est = x_filt[:, 0]
pos_std = np.sqrt(P_filt[:, 0, 0])

fig = go.Figure()

fig.add_trace(
    go.Scatter(x=t, y=pos_true, name="True position", line=dict(color="black"))
)
fig.add_trace(
    go.Scatter(
        x=t,
        y=pos_obs,
        name="Noisy observations",
        mode="markers",
        marker=dict(size=6, color="rgba(200,0,0,0.55)"),
    )
)
fig.add_trace(
    go.Scatter(x=t, y=pos_est, name="Filtered estimate", line=dict(color="#1f77b4"))
)

# Uncertainty band
fig.add_trace(
    go.Scatter(
        x=t,
        y=pos_est + 2.0 * pos_std,
        mode="lines",
        line=dict(color="rgba(31,119,180,0.25)"),
        name="+2σ",
        showlegend=False,
    )
)
fig.add_trace(
    go.Scatter(
        x=t,
        y=pos_est - 2.0 * pos_std,
        mode="lines",
        line=dict(color="rgba(31,119,180,0.25)"),
        fill="tonexty",
        fillcolor="rgba(31,119,180,0.15)",
        name="-2σ",
        showlegend=False,
    )
)

fig.update_layout(
    title="Kalman filter on 1D position tracking",
    xaxis_title="Time step",
    yaxis_title="Position",
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0),
    height=450,
)
fig.show()

14) Plotly: evolution of uncertainty (covariance)#

A nice way to see predict vs update is to compare:

  • prior variance (after prediction): \(\mathbf{P}_{t\mid t-1}\)

  • posterior variance (after update): \(\mathbf{P}_{t\mid t}\)

We plot the diagonal entries for position and velocity.

pos_var_pred = P_pred[:, 0, 0]
pos_var_filt = P_filt[:, 0, 0]
vel_var_pred = P_pred[:, 1, 1]
vel_var_filt = P_filt[:, 1, 1]

fig = make_subplots(
    rows=2,
    cols=1,
    shared_xaxes=True,
    subplot_titles=("Position variance", "Velocity variance"),
)

fig.add_trace(
    go.Scatter(x=t, y=pos_var_pred, name="pos var (pred)", line=dict(color="#ff7f0e")),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(x=t, y=pos_var_filt, name="pos var (filt)", line=dict(color="#1f77b4")),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(x=t, y=vel_var_pred, name="vel var (pred)", line=dict(color="#ff7f0e")),
    row=2,
    col=1,
)
fig.add_trace(
    go.Scatter(x=t, y=vel_var_filt, name="vel var (filt)", line=dict(color="#1f77b4")),
    row=2,
    col=1,
)

fig.update_layout(
    title="Uncertainty evolution (diagonal of covariance)",
    xaxis2_title="Time step",
    yaxis_title="Variance",
    yaxis2_title="Variance",
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0),
    height=520,
)
fig.show()

15) Effect of changing process vs measurement noise#

The Kalman filter’s behavior is governed by the relative size of:

  • \(\mathbf{Q}\) (process noise): how much you believe the dynamics can deviate from the model

  • \(\mathbf{R}\) (measurement noise): how noisy you believe the sensor is

We’ll keep the simulated data fixed, and run the filter with different assumed noise levels to show the tradeoff:

  • larger assumed \(\sigma_a\) (larger \(\mathbf{Q}\)) → more responsive to measurements, larger uncertainty

  • larger assumed \(\sigma_z\) (larger \(\mathbf{R}\)) → smoother estimates, measurements down-weighted

sigma_a_grid = [0.1, sigma_a_true, 1.5]
sigma_z_grid = [0.5, sigma_z_true, 5.0]

colors_a = {sigma_a_grid[0]: "#1f77b4", sigma_a_grid[1]: "#2ca02c", sigma_a_grid[2]: "#d62728"}
colors_z = {sigma_z_grid[0]: "#9467bd", sigma_z_grid[1]: "#8c564b", sigma_z_grid[2]: "#e377c2"}

runs_a = {}
for sigma_a in sigma_a_grid:
    _, _, Q_assumed, R_assumed = make_constant_velocity_model(dt, sigma_a, sigma_z_true)
    out = kalman_filter(y_obs, F, H, Q_assumed, R_assumed, x0_guess, P0_guess)
    runs_a[sigma_a] = out

runs_z = {}
for sigma_z in sigma_z_grid:
    _, _, Q_assumed, R_assumed = make_constant_velocity_model(dt, sigma_a_true, sigma_z)
    out = kalman_filter(y_obs, F, H, Q_assumed, R_assumed, x0_guess, P0_guess)
    runs_z[sigma_z] = out

fig = make_subplots(
    rows=2,
    cols=2,
    shared_xaxes=True,
    subplot_titles=(
        "Vary process noise (assumed σ_a)",
        "Vary measurement noise (assumed σ_z)",
        "Position variance vs σ_a",
        "Position variance vs σ_z",
    ),
)

# Context traces (true + observations) only once in the legend
fig.add_trace(
    go.Scatter(x=t, y=pos_true, name="True position", line=dict(color="black")),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(
        x=t,
        y=pos_obs,
        name="Observations",
        mode="markers",
        marker=dict(size=5, color="rgba(200,0,0,0.45)"),
    ),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(
        x=t,
        y=pos_true,
        name="True position (dup)",
        line=dict(color="black"),
        showlegend=False,
    ),
    row=1,
    col=2,
)
fig.add_trace(
    go.Scatter(
        x=t,
        y=pos_obs,
        name="Observations (dup)",
        mode="markers",
        marker=dict(size=5, color="rgba(200,0,0,0.45)"),
        showlegend=False,
    ),
    row=1,
    col=2,
)

# Vary Q via sigma_a
for sigma_a, out in runs_a.items():
    x_f = out["x_filt"]
    P_f = out["P_filt"]
    fig.add_trace(
        go.Scatter(
            x=t,
            y=x_f[:, 0],
            name=f"estimate (σ_a={sigma_a:g})",
            line=dict(color=colors_a[sigma_a]),
        ),
        row=1,
        col=1,
    )
    fig.add_trace(
        go.Scatter(
            x=t,
            y=P_f[:, 0, 0],
            name=f"var (σ_a={sigma_a:g})",
            line=dict(color=colors_a[sigma_a], dash="dot"),
            showlegend=False,
        ),
        row=2,
        col=1,
    )

# Vary R via sigma_z
for sigma_z, out in runs_z.items():
    x_f = out["x_filt"]
    P_f = out["P_filt"]
    fig.add_trace(
        go.Scatter(
            x=t,
            y=x_f[:, 0],
            name=f"estimate (σ_z={sigma_z:g})",
            line=dict(color=colors_z[sigma_z]),
        ),
        row=1,
        col=2,
    )
    fig.add_trace(
        go.Scatter(
            x=t,
            y=P_f[:, 0, 0],
            name=f"var (σ_z={sigma_z:g})",
            line=dict(color=colors_z[sigma_z], dash="dot"),
            showlegend=False,
        ),
        row=2,
        col=2,
    )

fig.update_xaxes(title_text="Time step", row=2, col=1)
fig.update_xaxes(title_text="Time step", row=2, col=2)
fig.update_yaxes(title_text="Position", row=1, col=1)
fig.update_yaxes(title_text="Position", row=1, col=2)
fig.update_yaxes(title_text="Variance", row=2, col=1)
fig.update_yaxes(title_text="Variance", row=2, col=2)

fig.update_layout(
    title="Tuning intuition: Q vs R",
    legend=dict(orientation="h", yanchor="bottom", y=1.05, xanchor="left", x=0),
    height=720,
)
fig.show()

Pitfalls + diagnostics#

Common issues in practice:

  • Bad \(\mathbf{Q}/\mathbf{R}\) tuning: too small \(\mathbf{Q}\) → lag / under-react; too small \(\mathbf{R}\) → chase noise.

  • Wrong state definition: if the state is missing key dynamics, the filter compensates by inflating noise and may still perform poorly.

  • Non-Gaussian noise / outliers: large outliers can produce huge innovations and destabilize the estimate.

  • Numerical issues: keep covariances symmetric PSD; prefer stable updates (Joseph form), and avoid explicit matrix inverses.

Diagnostics you can monitor:

  • innovations \(\tilde{\mathbf{y}}_t\) (should look like zero-mean noise if the model is well calibrated)

  • normalized innovation squared (NIS): \(\tilde{\mathbf{y}}_t^\top \mathbf{S}_t^{-1} \tilde{\mathbf{y}}_t\) (outlier/gating signal)
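A minimal NIS computation for a scalar random-walk model (all parameters here are illustrative): for an \(m\)-dimensional observation, NIS values should average around \(m\) when the filter is well calibrated, so in this scalar case the mean should be close to 1:

```python
import numpy as np

rng = np.random.default_rng(42)

# Scalar random-walk model: x_t = x_{t-1} + w_t,  y_t = x_t + v_t
q, r = 0.5, 1.0
T = 2_000
x = np.cumsum(rng.normal(0.0, np.sqrt(q), T))
y = x + rng.normal(0.0, np.sqrt(r), T)

x_hat, p = 0.0, 10.0  # initial filtered mean / variance
nis = np.zeros(T)
for t in range(T):
    # Predict
    p = p + q
    # Innovation, its variance, and the NIS statistic
    innov = y[t] - x_hat
    s = p + r
    nis[t] = innov**2 / s
    # Update
    k = p / s
    x_hat = x_hat + k * innov
    p = (1.0 - k) * p

print(nis.mean())  # should be close to 1 for a well-calibrated filter
```

A sustained NIS mean well above \(m\) suggests the filter is overconfident (noise covariances too small); well below \(m\) suggests it is underconfident.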

Exercises#

  1. Extend the example to 2D position with state \([x, y, v_x, v_y]^\top\).

  2. Add a control input (e.g., commanded acceleration) and verify the filter follows it.

  3. Intentionally mis-specify \(\mathbf{Q}\) or \(\mathbf{R}\) and quantify how RMSE changes.

  4. Introduce missing data (set some observations to NaN) and verify the filter performs prediction-only on those steps.

References#

  • R. E. Kalman (1960), A New Approach to Linear Filtering and Prediction Problems

  • Dan Simon, Optimal State Estimation

  • Kevin P. Murphy, Machine Learning: A Probabilistic Perspective (state-space models chapter)